Introduction to R: The Basics of R

Xiru Lyu

Consulting for Statistics, Computing & Analytics Research (CSCAR), University of Michigan

2022-11-30

Welcome! 👋

  • Before we start our session today, please use the link https://bit.ly/cscar-r-basics to download materials for today’s workshop.
  • The above link was shared with you via email (sent by xlyu@umich.edu).
    • If you signed up for this workshop after Nov 21st, you may not have received this email. Please send me an email (xlyu@umich.edu) and I can share the link with you.

About CSCAR

  • CSCAR provides individualized guidance and training to U-M researchers (faculty, staff, graduate students) on data collection, management, and analysis. The team also supports the use of statistical software and advanced computing.

    • Schedule an appointment with a staff member
    • Sign up for a consultation with a graduate student
    • Appointments can be either remote or in-person: Suite 3560, Rackham Building
  • We hold workshops on a variety of statistical topics.

  • CSCAR statisticians are available for hire.

  • Email us with your stats questions!

    • stats-consulting@umich.edu
    • ds-consulting@umich.edu

For more information, visit https://cscar.research.umich.edu.

Acknowledgement

Today’s workshop was inspired by the R workshop taught by Chris Andrew.

Agenda

  1. The R environment
  2. Using R scripts
  3. Getting started with R
    • Data types

    • Data structures

  4. Break time
  5. Creating R objects
  6. Programming in R
    • Conditional if/else statements

    • For/while loops

    • Functions

The R environment

About R

  • R is a programming language and an environment for statistical computing and graphics.

Why use R?

  • Statistical tools

  • Graphics

  • Extensibility

  • Reproducibility

  • FREE!

RStudio

RStudio is an integrated development environment (IDE) for R.

Download R and RStudio

Using R scripts

A sample R script

RStudio Shortcuts

  • Run the code

    • Mac: cmd + return
    • Windows: ctrl + enter
  • Create a comment

    • Mac: cmd + shift + c
    • Windows: ctrl + shift + c

Getting started with R

R as a calculator

2_basic_calculation.R

5 + 3 
## [1] 8
5 - 3 
## [1] 2
5 / 3 
## [1] 1.666667
5 * 3 
## [1] 15

17 %% 5 # remainder
## [1] 2
17 %/% 5 # integer division
## [1] 3

5 * 2 + 3 # multiplication is evaluated before addition
## [1] 13
5 * (2 + 3) # parentheses change the order of operations
## [1] 25

R as a calculator (cont.)

5 ^ 3
## [1] 125

sqrt(5)
## [1] 2.236068

exp(10)
## [1] 22026.47

log(100) # natural log
## [1] 4.60517

Exercise: How do we compute logarithms with a different base?

Answer
log10(100) # log with base 10
log(100, 10)
log(100, base=10)
log(x=100, base=10)
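
The answers above all rely on the change-of-base identity. The following is a minimal sketch, using only base R, that checks log(x, base = b) against log(x) / log(b). The values 8 and 2 are arbitrary inputs chosen for illustration.

```r
# log() with a base argument is equivalent to dividing two natural logs
x = 8
b = 2
via_base_arg = log(x, base = b)
via_identity = log(x) / log(b)

# log2() is a shortcut for base 2, just as log10() is for base 10
via_log2 = log2(x)

# compare with a tolerance, since floating-point results can differ slightly
print(all.equal(via_base_arg, via_identity))
print(all.equal(via_base_arg, via_log2))
```

Using all.equal() rather than == avoids false mismatches caused by rounding in floating-point arithmetic.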

R as a calculator (cont.)

round(exp(1), 3)
## [1] 2.718

round(exp(1)) # default for the number of decimal places is 0
## [1] 3

floor(3.1415926)
## [1] 3

ceiling(3.1415926)
## [1] 4

Data types

1_sample_script.R

  • double: 2, 1.5

  • integer: 2L

  • character: "abc", "1"

  • logical: TRUE, FALSE

    • abbreviations: T, F
  • missing value: NA

    • NA_real_, NA_integer_, NA_character_, NA
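
One way to confirm the types listed above is to ask R directly with typeof(), a base R function covered later in this workshop. This is a small sketch; the expected results are shown as comments.

```r
# typeof() reports how R stores a value internally
typeof(2)               # "double"
typeof(2L)              # "integer"
typeof("abc")           # "character"
typeof(TRUE)            # "logical"

# each flavor of missing value has its own type; plain NA is logical
typeof(NA)              # "logical"
typeof(NA_real_)        # "double"
typeof(NA_integer_)     # "integer"
typeof(NA_character_)   # "character"
```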

Data types: logical

5 == 6
## [1] FALSE
5 != 6
## [1] TRUE

5 < 6
## [1] TRUE
5 > 6
## [1] FALSE

5 <= 6
## [1] TRUE
5 >= 6
## [1] FALSE

5 < Inf
## [1] TRUE
5 < -Inf
## [1] FALSE

!TRUE
## [1] FALSE

Data types: missing value

Missing values tend to be infectious – most operations involving a missing value will return another missing value.

NA > 5
## [1] NA

10 * NA
## [1] NA

!NA
## [1] NA

Data types: missing value (cont.)

Of course, there are some exceptions: when the result does not depend on the missing value, R returns that result.

NA ^ 0
## [1] 1

NA | TRUE
## [1] TRUE

NA & FALSE
## [1] FALSE
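
Because missing values propagate, it helps to know how to detect and skip them. The sketch below uses the base R function is.na() and the na.rm argument of mean(); the vector scores is hypothetical data made up for illustration.

```r
scores = c(10, NA, 30)   # hypothetical data with one missing value

# comparing with == does not detect NA, because the comparison itself is NA
scores == NA             # NA NA NA

# is.na() returns TRUE or FALSE for each element
is.na(scores)            # FALSE TRUE FALSE

# many summary functions accept na.rm = TRUE to drop missing values first
mean(scores)                 # NA
mean(scores, na.rm = TRUE)   # 20

# count the missing values
sum(is.na(scores))           # 1
```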

Convert data types

as.numeric("30")
## [1] 30

as.character(30)
## [1] "30"

as.numeric(TRUE)
## [1] 1

as.numeric(FALSE)
## [1] 0
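
Conversions do not always succeed or round the way one might expect. This is a short sketch of a few edge cases in base R; the input strings and numbers are arbitrary examples.

```r
# a string that is not a number becomes NA (R also issues a warning,
# which is silenced here with suppressWarnings())
converted = suppressWarnings(as.numeric("thirty"))
converted                # NA

# as.integer() truncates toward zero rather than rounding
as.integer(3.9)          # 3
as.integer(-3.9)         # -3

# the string "TRUE" can be converted to a logical value
as.logical("TRUE")       # TRUE

# zero converts to FALSE; any other number converts to TRUE
as.logical(c(0, 1, 5))   # FALSE TRUE TRUE
```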

Data structures: vectors

  • atomic vectors

    • homogeneous: can only hold one data type
  • lists

    • heterogeneous: can hold multiple data types

Data structures: atomic vectors

c(1, 2, 3)
## [1] 1 2 3

c(1, NA, 3)
## [1]  1 NA  3

c("abc", "1")
## [1] "abc" "1"

Data structures: lists

list(1, "a", 1L, TRUE)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "a"
## 
## [[3]]
## [1] 1
## 
## [[4]]
## [1] TRUE


list(c(1, 2, 3), c("a", "b", "c"))
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] "a" "b" "c"

Data structures: atomic vectors (cont.)

Exercise: What if we mix data types within an atomic vector?

c(1, "2", TRUE, 1L, 4.5, NA)
## [1] "1"    "2"    "TRUE" "1"    "4.5"  NA


c(1, TRUE, FALSE)
## [1] 1 1 0

coercion

Data values are coerced in a fixed order:

character \(\leftarrow\) double \(\leftarrow\) integer \(\leftarrow\) logical
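
The coercion order can be checked one step at a time with typeof(). The sketch below also shows a common use of logical-to-numeric coercion; the vector passed is hypothetical data.

```r
# each pair is coerced to the more general of its two types
typeof(c(TRUE, 2L))      # "integer"
typeof(c(2L, 1.5))       # "double"
typeof(c(1.5, "a"))      # "character"

# TRUE counts as 1 and FALSE as 0, which makes counting easy
passed = c(TRUE, FALSE, TRUE, TRUE)   # hypothetical test results
sum(passed)              # 3, the number of TRUE values
mean(passed)             # 0.75, the proportion of TRUE values
```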

Data structures: matrices and arrays

  • A matrix or an array is a vector with a dimension attribute.

    • Attributes attach metadata to the vector
  • A matrix has 2 dimensions; an array can have any number of dimensions
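
To see that a matrix really is a vector with a dimension attribute, one can attach the attribute by hand. This is a minimal sketch in base R; the object name x is arbitrary.

```r
x = 1:6
attributes(x)    # NULL: a plain vector has no attributes

# assigning a dim attribute turns the vector into a 2 x 3 matrix
dim(x) = c(2, 3)
x
is.matrix(x)     # TRUE
attributes(x)    # $dim is now 2 3

# the result matches what matrix() builds from the same values
all(x == matrix(1:6, nrow = 2))   # TRUE
```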

Data structures: matrices

matrix(1:12, nrow = 4, ncol = 3)
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

matrix(1:12, nrow = 4)
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

matrix(1:12, ncol = 3)
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
matrix(1:12, nrow = 5)
## Warning in matrix(1:12, nrow = 5): data length [12] is not a sub-multiple or
## multiple of the number of rows [5]
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8    1
## [4,]    4    9    2
## [5,]    5   10    3

Data structures: matrices (cont.)

matrix(1:12, nrow = 4)
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

Exercise: How do we create a matrix with values filled by row instead?

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
Answer
matrix(1:12, nrow = 4, byrow=TRUE)

Data structures: arrays

array(1:12, dim=c(2,3,2))
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
array(1:12, dim=c(2,3,3))
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    7    9   11
## [2,]    8   10   12
## 
## , , 3
## 
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

Data structures: data frames

  • A data frame is a named list, with the constraint that elements must have the same length.
  • Given this rectangular structure, data frames have 2 dimensions and share properties of both matrices and lists.
data.frame(id = c(1, 2, 3), 
           name = c("James", "Kim", "Lisa"), 
           student = c(TRUE, FALSE, TRUE))
##   id  name student
## 1  1 James    TRUE
## 2  2   Kim   FALSE
## 3  3  Lisa    TRUE
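
The list-like and matrix-like sides of a data frame can both be inspected directly. The sketch below rebuilds the data frame from this slide under the hypothetical name people and assumes R 4.0 or later, where character columns are not converted to factors by default.

```r
people = data.frame(id = c(1, 2, 3),
                    name = c("James", "Kim", "Lisa"),
                    student = c(TRUE, FALSE, TRUE))

# list-like behavior: each column is a named element
is.list(people)         # TRUE
names(people)           # "id" "name" "student"

# matrix-like behavior: rows and columns
dim(people)             # 3 3

# unlike a matrix, each column keeps its own type
sapply(people, class)   # numeric, character, logical
```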

Creating R objects

Assignment

3_objects.R

x <- 5
x = 5 # I use this
5 -> x # not recommended

x
## [1] 5

print(x)
## [1] 5

x.y = 5
x_y = 10

a = 5
A = 7

print(c(a, A))
## [1] 5 7
  • An object is an entity that contains information and can be manipulated by commands.

  • Use an object name that’s informative.

    • R is case sensitive.

    • Use _ or . as separators in the object name.

    • The object name should not start with a number.
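
The naming rules above can be explored with make.names(), a base R function that turns an arbitrary string into a syntactically valid name. This is a small sketch; the object names are invented for illustration.

```r
total_sales = 100     # underscore as a separator
total.sales = 100     # period as a separator

# make.names() shows how R would repair an invalid name
make.names("2nd_value")     # "X2nd_value": a name cannot start with a number
make.names("total-sales")   # "total.sales": a hyphen would be read as subtraction

# names that differ only in case refer to different objects
Total_sales = 200
total_sales == Total_sales  # FALSE
```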

Manipulation

3_objects.R

x = 5
x + 2
## [1] 7

x = x + 2
x
## [1] 7

y = 3
x - y 
## [1] 4

x == y # test of equality
## [1] FALSE
x = y # assignment
print(c(x, y))
## [1] 3 3

x = c(1, 2, 3)
x
## [1] 1 2 3

rm(x) # remove an object from the environment
x
## Error in eval(expr, envir, enclos): object 'x' not found

Exercise: creating some objects

In 3_objects.R, follow instructions and create the following objects:

  1. a vector named vec with values 1, 5, 6, 9, 0
Answer
vec = c(1, 5, 6, 9, 0)
  2. a matrix named mat with 3 columns and 4 rows, using integer values from -4 to 7. Fill the values by row.
Answer
mat = matrix(-4:7, nrow = 4, byrow = TRUE)
mat = matrix(-4:7, ncol = 3, byrow = TRUE)
mat = matrix(-4:7, nrow = 4, ncol = 3, byrow = TRUE)
  3. a list named ls with the above objects vec and mat as its elements
Answer
ls = list(vec, mat)

Exercise: creating some objects (cont.)

  4. a data frame named df with four named columns –

    • city: Ann Arbor, Boston, Atlanta
    • state: MI, MA, GA
    • lat: 42.278046, 42.361145, 33.753746
    • lng: -83.738220, -71.057083, -84.386330
Answer
df = data.frame(city = c("Ann Arbor", "Boston", "Atlanta"),
                state = c("MI", "MA", "GA"),
                lat = c(42.278046, 42.361145, 33.753746),
                lng = c(-83.738220, -71.057083, -84.386330))

Exploring data structures

class()

returns the (high-level) type of object

class(vec)
## [1] "numeric"
class(mat)
## [1] "matrix" "array"
class(ls)
## [1] "list"
class(df)
## [1] "data.frame"

typeof()

returns the (low-level) type of object

typeof(vec)
## [1] "double"
typeof(mat)
## [1] "integer"
typeof(ls)
## [1] "list"
typeof(df)
## [1] "list"
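
The difference between the high-level and low-level type is easiest to see on an object where the two disagree. The sketch below uses a factor, which is not otherwise covered in this workshop, and a small numeric matrix; both objects are made up for illustration, and the class() result for the matrix assumes R 4.0 or later.

```r
# a factor is presented as categories but stored as integer codes
f = factor(c("low", "high", "low"))
class(f)     # "factor"
typeof(f)    # "integer"

# a matrix of decimals is a matrix on top of double storage
m = matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2)
class(m)     # "matrix" "array"
typeof(m)    # "double"
```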

Exploring data structures (cont.)

attributes()

returns the metadata (if any) associated with the object

attributes(vec)
## NULL
attributes(ls)
## NULL

attributes(mat)
## $dim
## [1] 4 3
attributes(df)
## $names
## [1] "city"  "state" "lat"   "lng"  
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3

length()

returns the length of the object

# the function works better 
# with 1-d objects

length(vec)
## [1] 5
length(ls)
## [1] 2


length(mat)
## [1] 12
length(df)
## [1] 4

Exploring data structures (cont.)

dim()

returns the dimension of a multi-dimensional object

dim(mat)
## [1] 4 3
dim(df)
## [1] 3 4

nrow(), ncol()

nrow(mat)
## [1] 4
ncol(mat)
## [1] 3

nrow(df)
## [1] 3
ncol(df)
## [1] 4

Subsetting objects: atomic vectors

vec
## [1] 1 5 6 9 0

# subset by index
vec[1]
## [1] 1

# assign the subsetted value to another object
sub_vec = vec[2]
sub_vec
## [1] 5

# subset elements in any order
vec[c(1,3)]
## [1] 1 6
vec[c(3,1)]
## [1] 6 1

# elements can be subsetted any number of times
vec[c(3,1,5,3,5)]
## [1] 6 1 0 6 0

# extract all elements in the vector except the second
vec[-2]
## [1] 1 6 9 0
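
Besides positions, a vector can be subset with a logical condition. This is a minimal sketch that recreates vec from the earlier exercise so it runs on its own.

```r
vec = c(1, 5, 6, 9, 0)

# a comparison returns one TRUE/FALSE per element
vec > 4                   # FALSE TRUE TRUE TRUE FALSE

# using that result inside [] keeps only the TRUE positions
vec[vec > 4]              # 5 6 9

# conditions can be combined with & (and) or | (or)
vec[vec > 4 & vec < 9]    # 5 6

# several elements can be dropped at once with a negative vector
vec[-c(1, 2)]             # 6 9 0
```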

Subsetting objects: lists

ls
## [[1]]
## [1] 1 5 6 9 0
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]   -4   -3   -2
## [2,]   -1    0    1
## [3,]    2    3    4
## [4,]    5    6    7

ls[[1]]
## [1] 1 5 6 9 0

ls[[1]][2]
## [1] 5
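
Single and double brackets behave differently on lists, which is a frequent source of confusion. The sketch below uses a hypothetical named list called items (rather than ls) to show the difference, along with access by name.

```r
items = list(numbers = c(1, 5, 6, 9, 0), codes = c("a", "b"))  # hypothetical list

# single brackets return a smaller list
sub_list = items[1]
class(sub_list)          # "list"

# double brackets return the element itself
element = items[[1]]
class(element)           # "numeric"

# named elements can also be reached with $ or [["name"]]
items$codes              # "a" "b"
items[["numbers"]][2]    # 5
```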

Subsetting objects: matrices

mat
##      [,1] [,2] [,3]
## [1,]   -4   -3   -2
## [2,]   -1    0    1
## [3,]    2    3    4
## [4,]    5    6    7

mat[2,3]
## [1] 1

mat[c(1,2), 3]
## [1] -2  1

mat[1:2, 1:3]
##      [,1] [,2] [,3]
## [1,]   -4   -3   -2
## [2,]   -1    0    1

mat[2,]
## [1] -1  0  1
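
Note that extracting a single row, as in mat[2,], returns a plain vector rather than a matrix. If the matrix shape is needed, the drop argument of the subsetting operator can keep it. A short sketch, recreating mat so it runs on its own:

```r
mat = matrix(-4:7, nrow = 4, byrow = TRUE)

# by default a single row is simplified to a vector
row2 = mat[2, ]
dim(row2)                 # NULL

# drop = FALSE keeps the result as a 1 x 3 matrix
row2_mat = mat[2, , drop = FALSE]
dim(row2_mat)             # 1 3
```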

Exercise: subsetting the matrix

  • Using the minus sign, extract all elements of the matrix mat except the third column
     [,1] [,2]
[1,]   -4   -3
[2,]   -1    0
[3,]    2    3
[4,]    5    6
Answer
mat[,-3]
  • Reorder the rows of mat so that the first row becomes the last and the last row becomes the first
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]   -1    0    1
[3,]    2    3    4
[4,]   -4   -3   -2
Answer
rbind(mat[4,], mat[2:3,], mat[1,])
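
As an alternative to rbind(), the same reordering can be written as a single subsetting call by listing the row indices in the desired order. This is one possible sketch, not the only correct answer.

```r
mat = matrix(-4:7, nrow = 4, byrow = TRUE)

# list the rows in the order they should appear: 4, 2, 3, 1
swapped = mat[c(4, 2, 3, 1), ]
swapped

# for comparison, reversing all rows uses a descending sequence;
# note this also swaps rows 2 and 3, so it is a different result
reversed = mat[nrow(mat):1, ]
reversed
```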

Subsetting objects: data frames

df
##        city state      lat       lng
## 1 Ann Arbor    MI 42.27805 -83.73822
## 2    Boston    MA 42.36115 -71.05708
## 3   Atlanta    GA 33.75375 -84.38633

df[,1]
## [1] "Ann Arbor" "Boston"    "Atlanta"

df[2,1]
## [1] "Boston"

df$city
## [1] "Ann Arbor" "Boston"    "Atlanta"

df[df$city == "Ann Arbor",]
##        city state      lat       lng
## 1 Ann Arbor    MI 42.27805 -83.73822

df$lat[df$city == "Ann Arbor"]
## [1] 42.27805
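
Row conditions and column names can be combined in one subsetting call. The sketch below recreates df from the earlier exercise; the cutoffs 40 and -80 are arbitrary values chosen for illustration.

```r
df = data.frame(city = c("Ann Arbor", "Boston", "Atlanta"),
                state = c("MI", "MA", "GA"),
                lat = c(42.278046, 42.361145, 33.753746),
                lng = c(-83.738220, -71.057083, -84.386330))

# rows with latitude above 40, keeping only two columns selected by name
northern = df[df$lat > 40, c("city", "state")]
northern

# combine conditions with & ; selecting one column returns a vector
north_west = df[df$lat > 40 & df$lng < -80, "city"]
north_west    # "Ann Arbor"
```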

Subsetting objects: which()

  • The which() function takes a logical statement as an argument, and returns indices for which the statement is true.
vec
## [1] 1 5 6 9 0
which(vec == 5)
## [1] 2

df
##        city state      lat       lng
## 1 Ann Arbor    MI 42.27805 -83.73822
## 2    Boston    MA 42.36115 -71.05708
## 3   Atlanta    GA 33.75375 -84.38633
which(df$city == "Ann Arbor")
## [1] 1
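
which() is most useful when several elements, or none, satisfy the condition. A brief sketch with the same vec, plus the related base R helpers which.max() and which.min():

```r
vec = c(1, 5, 6, 9, 0)

# several matches return several indices
which(vec > 4)          # 2 3 4

# no match returns an empty integer vector
which(vec > 100)        # integer(0)

# position of the largest and smallest value
which.max(vec)          # 4
which.min(vec)          # 5

# the indices can be fed back into [] to retrieve the values
vec[which(vec > 4)]     # 5 6 9
```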

Index assignment

vec
## [1] 1 5 6 9 0

vec[3] = 55

vec
## [1]  1  5 55  9  0
mat
##      [,1] [,2] [,3]
## [1,]   -4   -3   -2
## [2,]   -1    0    1
## [3,]    2    3    4
## [4,]    5    6    7

mat[1:2, 3] = 0

mat
##      [,1] [,2] [,3]
## [1,]   -4   -3    0
## [2,]   -1    0    0
## [3,]    2    3    4
## [4,]    5    6    7
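
Index assignment also works with logical conditions and with data frame columns. The following sketch recreates small versions of vec and df so it runs on its own; the column name northern and the replacement values are invented for illustration.

```r
vec = c(1, 5, 6, 9, 0)

# replace every element that meets a condition
vec[vec == 0] = NA
vec                     # 1 5 6 9 NA

df = data.frame(city = c("Ann Arbor", "Boston", "Atlanta"),
                lat = c(42.278046, 42.361145, 33.753746))

# assigning to a new name with $ adds a column
df$northern = df$lat > 40
df$northern             # TRUE TRUE FALSE

# update a single cell selected by a condition
df$lat[df$city == "Boston"] = 42.36
df
```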

Programming in R

if...else statement

  • The if...else is a conditional statement.
  • The if statement can be followed by an optional else statement.

https://www.datamentor.io/r-programming/if-else-statement/

if...else statement (cont.)

4_ifelse.R

1a.

if (test expression A is true) {
  execute command A
}

2a.

if (test expression A is true) {
  execute command A
} else {
  execute command B
}

1b.

if (x > 0) {
  print("positive number")
}

2b.

if (x > 0) {
  print("positive number")
} else {
  print("non-positive number")
}
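
The if statement expects a single TRUE or FALSE. To apply the same test to every element of a vector, base R provides the vectorized function ifelse(). A minimal sketch, with x and signs as illustrative names:

```r
x = c(-2, 0, 3)

# ifelse(test, value if TRUE, value if FALSE) works element by element
signs = ifelse(x > 0, "positive number", "non-positive number")
signs
```

Here signs holds one label per element of x: two "non-positive number" entries followed by one "positive number".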

if...else if...else statement

if (test expression A is true) {
  execute command A
} else if (test expression B is true) {
  execute command B
}

# you can insert as many else if
# statements as you want
...

else if (test expression Y is true) {
  execute command Y
} else {
  execute command Z
} # the else statement is optional

Exercise: Using the syntax above, write an if...else if...else statement that performs the following operations:

  • if x > 0, print positive number
  • if x = 0, print zero
  • if x < 0, print negative number

if...else if...else statement

Answer
if (x > 0) {
  print("positive number")
} else if (x == 0) {
  print("zero")
} else {
  print("negative number")
}


if (x > 0) {
  print("positive number")
} else if (x == 0) {
  print("zero")
} else if (x < 0) {
  print("negative number")
}

for loops

5_for_while_loops.R

  • This loop is used for iterating over a sequence.
for (value in sequence) {
  execute the command
}
vec = 1:10


# iterating based on the index
for (i in 1:length(vec)) {
  print(vec[i] +  1)
}
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
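
When looping over indices, seq_along() is often suggested as a safer alternative to 1:length(), because the two differ when the vector is empty. A small sketch of that edge case; the names empty and count are illustrative.

```r
empty = numeric(0)        # a vector with no elements

# 1:length(empty) is 1:0, which counts down and yields two values
1:length(empty)           # 1 0

# seq_along() yields no indices at all
seq_along(empty)          # integer(0)

# so this loop body correctly runs zero times
count = 0
for (i in seq_along(empty)) {
  count = count + 1
}
count                     # 0
```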

for loops (cont.)

5_for_while_loops.R

# iterating based on
# the actual value
for (i in vec) {
  print(i + 1)
}
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
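
The loops above print each result. To keep the results, one common pattern is to create a vector of the right length first and fill it inside the loop. The sketch below also shows that, for simple arithmetic like this, R can operate on the whole vector at once; result is an illustrative name.

```r
vec = 1:10

# create an empty numeric vector of the right length, then fill it
result = numeric(length(vec))
for (i in seq_along(vec)) {
  result[i] = vec[i] + 1
}
result                    # 2 3 4 5 6 7 8 9 10 11

# arithmetic in R is vectorized, so no loop is needed here
vec + 1

# both approaches give the same values
all(result == vec + 1)    # TRUE
```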

while loops

5_for_while_loops.R

  • This loop executes the command as long as the logical statement is true.
while (test statement is true) {
  execute the command
}
a = 100

while (a > 95) {
  print(a)
  a = a - 1
}
[1] 100
[1] 99
[1] 98
[1] 97
[1] 96

Functions

6_functions.R

fn = function(arg1, arg2, ...) {
  execute the command

  return(final product)
}

# arguments can be given default values
fn = function(arg1=3, arg2="val2", ...) {
  execute the command

  return(final product)
}

Functions (cont.)

Let’s try to write a function that converts Fahrenheit to Celsius using the formula \[ C = \frac{5}{9}(F - 32)\]

fahrenheit_to_celsius = function(temp_f = 80) {
  temp_c = (temp_f - 32) * 5 / 9
  return(temp_c)
}

fahrenheit_to_celsius()
## [1] 26.66667
fahrenheit_to_celsius(temp_f = 20)
## [1] -6.666667
fahrenheit_to_celsius(20)
## [1] -6.666667
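
Because the arithmetic inside the function is vectorized, the same function can convert several temperatures in one call. The sketch below repeats the function with one optional addition that is not part of the original: a stopifnot() check that rejects non-numeric input.

```r
fahrenheit_to_celsius = function(temp_f = 80) {
  # optional input check (an addition for illustration)
  stopifnot(is.numeric(temp_f))
  temp_c = (temp_f - 32) * 5 / 9
  return(temp_c)
}

# freezing and boiling points of water, converted in one call
fahrenheit_to_celsius(c(32, 212))     # 0 100
```

With the check in place, a call such as fahrenheit_to_celsius("hot") stops with an error instead of failing inside the arithmetic.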

Functions (cont.)

Exercise

  • Write a function celsius_to_kelvin() that converts Celsius to Kelvin, using the formula \[K = C + 273.15\]
Answer
celsius_to_kelvin = function(temp_c) {
  temp_k = temp_c + 273.15
  return(temp_k)
}
  • How can we convert Fahrenheit to Kelvin? What is 100 Fahrenheit equal to in Kelvin?
Answer
celsius_to_kelvin(fahrenheit_to_celsius(100))
[1] 310.9278

apply()

6_functions.R

  • The apply() function performs a function on each row or column of a matrix/data frame.
v1 = c(22, 40, 13, 55, 48, 19, 42)
v2 = c(3, 16, 7, 20, 11, 5, 9)
mat = cbind(v1, v2)

mat
     v1 v2
[1,] 22  3
[2,] 40 16
[3,] 13  7
[4,] 55 20
[5,] 48 11
[6,] 19  5
[7,] 42  9
apply(mat, 2, function(x) sum(x))
##  v1  v2 
## 239  71

apply(mat, 2, sum)
##  v1  v2 
## 239  71
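
The second argument of apply() selects the direction: 1 for rows, 2 for columns. The sketch below recreates mat and applies functions in both directions; it also shows rowSums() and colSums(), base R shortcuts for this particular case.

```r
v1 = c(22, 40, 13, 55, 48, 19, 42)
v2 = c(3, 16, 7, 20, 11, 5, 9)
mat = cbind(v1, v2)

# margin 1: apply the function to each row
row_totals = apply(mat, 1, sum)
row_totals                # 25 56 20 75 59 24 51

# margin 2 with a different function: the largest value per column
col_max = apply(mat, 2, max)
col_max                   # v1 = 55, v2 = 20

# dedicated helpers for sums
rowSums(mat)
colSums(mat)              # v1 = 239, v2 = 71
```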

Exercise: apply()

  1. Write a function add_first_last() that computes the sum of a vector’s first and last elements.
Answer
add_first_last = function(v) {
  first_element = v[1]
  last_element = v[length(v)]
  return(first_element + last_element)
}
  2. Create a matrix named m of the form
     [,1] [,2] [,3]
[1,]    9    4    7
[2,]   14   47   74
[3,]    5    0    1
[4,]    6   10   19
Answer
m = rbind(c(9, 4, 7),
          c(14, 47, 74),
          c(5, 0, 1),
          c(6, 10, 19))

Exercise: apply() (cont.)

  3. Compute the sum of first and last elements for every row in m.
Answer
apply(m, 1, function(k) add_first_last(k))
## [1] 16 88  6 25
apply(m, 1, add_first_last)
## [1] 16 88  6 25